Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation
In recent years, machine learning models have rapidly become better at
generating clinical consultation notes; yet, there is little work on how to
properly evaluate the generated consultation notes to understand the impact
they may have on both the clinician using them and the patient's clinical
safety. To address this, we present an extensive human evaluation study of
consultation notes where 5 clinicians (i) listen to 57 mock consultations, (ii)
write their own notes, (iii) post-edit a number of automatically generated
notes, and (iv) extract all the errors, both quantitative and qualitative. We
then carry out a correlation study with 18 automatic quality metrics and the
human judgements. We find that a simple, character-based Levenshtein distance
metric performs on par if not better than common model-based metrics like
BertScore. All our findings and annotations are open-sourced.Comment: To be published in proceedings of ACL 202
User-driven development of a medical note generation system
A growing body of work uses Natural Language Processing (NLP) methods to automatically generate medical notes from audio recordings of doctor-patient consultations.
However, there are very few studies on how
such systems could be used in clinical practice,
how clinicians would adjust to using them, or
how system design should be influenced by
such considerations. In this paper, we present
three rounds of user studies, carried out in the
context of developing a medical note generation system. We present, analyse and discuss
the participating clinicians’ impressions and
views of how the system ought to be adapted
to be of value to them. Next, we describe a
three-week test run of the system in a live telehealth clinical practice. Major findings include
(i) the emergence of five different note-taking
behaviours; (ii) the importance of the system
generating notes in real time during the consultation; and (iii) the identification of a number
of clinical use cases that could prove challenging for automatic note generation systems.
User-Driven Research of Medical Note Generation Software
A growing body of work uses Natural Language Processing (NLP) methods to
automatically generate medical notes from audio recordings of doctor-patient
consultations. However, there are very few studies on how such systems could be
used in clinical practice, how clinicians would adjust to using them, or how
system design should be influenced by such considerations. In this paper, we
present three rounds of user studies, carried out in the context of developing
a medical note generation system. We present, analyse and discuss the
participating clinicians' impressions and views of how the system ought to be
adapted to be of value to them. Next, we describe a three-week test run of the
system in a live telehealth clinical practice. Major findings include (i) the
emergence of five different note-taking behaviours; (ii) the importance of the
system generating notes in real time during the consultation; and (iii) the
identification of a number of clinical use cases that could prove challenging
for automatic note generation systems.
Comment: Accepted for publication at NAACL 202